Emergence of Zipf's Law in the Evolution of Communication
Zipf's law seems to be ubiquitous in human languages and appears to be a universal property of complex communicating systems. Following the early proposal made by Zipf concerning the presence of a tension between the efforts of speaker and hearer in a communication system, we introduce evolution by means of a variational approach to the problem based on Kullback's Minimum Discrimination of Information Principle. Therefore, using a formalism fully embedded in the framework of information theory, we demonstrate that Zipf's law is the only expected outcome of an evolving, communicative system under a rigorous definition of the communicative tension described by Zipf.
I. INTRODUCTION

Zipf's law is one of the most common power laws found in nature and society [THS]. Although it was observed early in the distribution of money income [7] and city sizes ^, it was popularized by the linguist George Kingsley Zipf, who observed that it accounts for the frequency of words within written texts [2] [3]. Specifically, if we rank all the occurrences of words in a text from the most common to the least, Zipf's law states that the probability $q(s_m)$ that in a random trial we find the $m$-th most common word ($m = 1, \ldots, n$) falls off as

$$q(s_m) = \frac{1}{Z}\, m^{-\gamma}, \quad \text{where} \quad Z = \sum_{j \leq n} j^{-\gamma},$$

with $\gamma \approx 1$. The ubiquity of this scaling behavior has suggested several mechanisms to account for the emergence of this distribution; among many others, see jH [8UT2].
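As a concrete illustration of the definition above, the following minimal Python sketch builds the distribution $q(s_m) = m^{-\gamma}/Z$ for a finite vocabulary. The function name `zipf_pmf` and the values of `n` and `gamma` are illustrative assumptions, not part of the original text.

```python
def zipf_pmf(n, gamma=1.0):
    """Return [q(s_1), ..., q(s_n)] with q(s_m) = m**(-gamma) / Z."""
    weights = [m ** (-gamma) for m in range(1, n + 1)]
    z = sum(weights)  # normalization constant Z
    return [w / z for w in weights]


q = zipf_pmf(5, gamma=1.0)
print([round(x, 4) for x in q])
# With gamma = 1 the most common word is twice as likely as the second one.
print(q[0] / q[1])
```

For $\gamma = 1$ the ratio between the probabilities of ranks $m$ and $m+1$ is $(m+1)/m$, which is the form the quotient of successive probabilities takes at the end of the derivation.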

Within the context of human language, G. K. Zipf conjectured early on that this scaling law is the outcome of a tension between two forces acting in a communication system [3]. Following Zipf's proposal, speakers and hearers need to simultaneously minimize their efforts, under what he called vocabulary balance, a particular case of the so-called Principle of Least Effort. This triggers a tension between the two communicative agents. The speaker's economy would favour a reduction of the size of the vocabulary to a single word, whereas the hearer's economy would lead to an increase of the size of the vocabulary to a point where there would be a different word for each meaning. The resulting vocabulary would emerge out of this unification-diversification conflict [3]. Although both numerical and theoretical studies have explored this idea [10] [11] [13], no truly analytic proof of unicity has been provided under realistic, information-theoretic constraints. We can view the proposals made in [10] [11] [13] as static, for they consider a fixed size of the code.

FIG. 1: A growing communication system. In (a), the possible meaning-signal associations made by the coder module P in which eq. (2) holds are depicted. In (b) we summarize the evolution rules of our communicative system. Suppose that the symmetry between coder and decoder, i.e., eq. (2), holds for the step n (above). At each step (below) a new element is added to the set $\Omega$ and eq. (2) holds again for this new configuration. Furthermore, the new configuration is constrained by the MDIP, which introduces a path dependency in the evolutionary process.

A recent approach, which goes beyond the communicative framework, defined the key complexity properties that a system needs in order to display event statistics following Zipf's law: an open, unbounded number of accessible states and a linear loss of entropy due to generic internal constraints [12]. The linear loss of entropy captures the intuitive idea that the studied systems are in an intermediate state between order and disorder (or that a possible informative tension is balanced, as we shall see), and the unbounded number of accessible states reflects their open nature. It was shown that, under a very general parametrization, and imposing scale-invariance properties on the solution, Zipf's law was the only possible outcome.

Now we adapt and enrich the general framework proposed in [12] to the communicative context. As we shall see, Zipf's hypothesis can be interpreted in such a way that the system can be studied within the framework proposed in [12]. Moreover, the parameters that were arbitrary in the general mathematical framework mentioned above can now be naturally interpreted in the communicative framework as the key pieces of the mathematical statement of Zipf's hypothesis.

Beyond the mathematical formalization of the communicative conflict described by Zipf, we need another ingredient, pointed out in a different context in [14], namely the active role played by the evolutionary path followed by the code. As occurs with other systems growing out of equilibrium, such as scale-free networks [15], we will consider the evolution of the communicative exchange under the system's growth. Here the evolutionary component is variationally introduced by minimizing the divergence between code configurations belonging to successive time steps. This minimal change follows the so-called Minimum Discrimination Information Principle (henceforth MDIP), a general variational principle considered analogous to the Maximum Entropy Principle [1^, from which statistical mechanics can be properly formalized [13 [^. The MDIP states that, under changes in the constraints of the system, the most expected probability distribution is the one minimizing the Kullback-Leibler divergence (also referred to as Kullback-Leibler entropy or relative entropy) from the original one [17]. Such a variational principle constrains the changes of the internal configurations of a statistical ensemble when the external conditions change, in the same way that the internal configurations of a statistical ensemble change when we introduce moment constraints in a Jaynesian formalism. In our context, this information-theoretic functional assumes the role of a Lagrangian whose minimization along the process defines the possible ensemble configurations one can observe at a certain point of an evolutionary path.

Using the MDIP and the framework provided in [12], we provide a proof of unicity for the emergence of Zipf's law in evolving codes. We stress that no arbitrary assumptions are made on the nature of the solutions.

The remainder of the paper is structured as follows: In section II we rigorously define the communicative tension intuitively described by Zipf and explicitly characterize the evolutionary process in terms of the mathematical statement of such a tension. In section III we apply the MDIP as the guiding variational principle which accounts for the possible evolutionary paths of the code. Finally, we demonstrate that the consequences of the application of both the communicative tension and the MDIP account for the emergence of Zipf's law as the unique possible solution of the evolving code. In section IV we discuss the implications of our results.



II. THE EVOLUTION OF THE 
COMMUNICATIVE SYSTEM 

In this section we mathematically define 1) the communicative tension described by Zipf and 2) the evolution or growth of a given code subject to such a tension. We furthermore define the range of application of our formalism. As we shall see in section III, the proposal made in this section defines a framework whose key piece is eq. (6).

A. The explicit description of the communicative 
conflict 

The first task is to properly define the communicative tension between the coder and the decoder and how this tension is solved. Following the standard nomenclature used in studies of the evolution of communicating, autonomous agents [T9]-[2j, in our system there are two agents: the coder agent, P, encoding information from a set of external events, $\Omega$, and the decoder or external observer, which infers the behavior of $\Omega$ through the code provided by the coder agent P. In this way,

$$\Omega = \{m_1, \ldots, m_n\}$$

is the set of external events acting as the input alphabet, and

$$S = \{s_1, \ldots, s_n\}$$

is the set of signals or output alphabet. The coder module P (see fig. 1) is fully described by a matrix $P(X_s|X_\Omega)$, where $X_\Omega$ is a random variable taking values on the set $\Omega$ following the probability measure $p$, $p(m_k)$ being the probability to have symbol $m_k$ as the input in a given computation. Complementarily, $X_s$ is a random variable taking values on $S$ and following the probability distribution $q$ which, for a given $s_i \in S$, reads:

$$q(s_i) = \sum_{k \leq n} p(m_k)\, P(s_i|m_k), \tag{1}$$

i.e., the probability to obtain $s_i$ as the output of a codification. We assume that, for every $m_k \in \Omega$,

$$\sum_{i \leq n} P(s_i|m_k) = 1.$$

For the decoder agent inferring the input set from the output set with least effort, the best scenario is a one-to-one mapping between $\Omega$ and $S$. In this case, P generates an unambiguous code, and no supplementary amount of information is required to successfully reconstruct $X_\Omega$. However, from the perspective of the coding device, this coding has a high cost. In order to characterize this conflict, let us properly formalize the above intuitive statement: The decoder agent wants to reconstruct $X_\Omega$ through the intermediation of the coding performed by P. Therefore, the amount of bits needed by the decoder of $X_s$ to unambiguously reconstruct $X_\Omega$ is

$$H(X_\Omega, X_s) = -\sum_{i \leq n} \sum_{k \leq n} P(m_k, s_i) \log P(m_k, s_i),$$

where $P(m_k, s_i) = p(m_k) P(s_i|m_k)$ is the joint probability. This is the joint Shannon entropy or, simply, joint entropy of the two random variables $X_\Omega, X_s$ [26]. From the codification process, the decoder receives $H(X_s)$ bits, and thus, the remaining uncertainty it must face will be

$$H(X_\Omega, X_s) - H(X_s) = H(X_\Omega|X_s),$$

where

$$H(X_s) = -\sum_{i \leq n} q(s_i) \log q(s_i)$$

(i.e., the entropy of the random variable $X_s$) and

$$H(X_\Omega|X_s) = -\sum_{i \leq n} q(s_i) \sum_{k \leq n} P(m_k|s_i) \log P(m_k|s_i),$$

the conditional entropy of the random variable $X_\Omega$ conditioned to the random variable $X_s$. The tension between the coder and the decoder is solved by imposing a symmetric balance between their associated efforts (see fig. 1), i.e.: The coder sends as many bits as the additional bits the observer needs to perfectly reconstruct $X_\Omega$:

$$H(X_s) = H(X_\Omega|X_s). \tag{2}$$
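To make the quantities entering the symmetry condition concrete, the sketch below computes $H(X_s)$, $H(X_\Omega, X_s)$ and $H(X_\Omega|X_s)$ in bits from an input distribution and a coder matrix. The function names (`entropies`, `symmetry_gap`) and the two toy coders are hypothetical choices made for illustration only.

```python
import math


def entropies(p, P):
    """p[k] = p(m_k); P[k][i] = P(s_i | m_k), each row summing to 1.

    Returns (H(X_s), H(X_Omega, X_s), H(X_Omega | X_s)) in bits.
    """
    n_in, n_out = len(p), len(P[0])
    # Joint distribution P(m_k, s_i) = p(m_k) P(s_i | m_k).
    joint = [[p[k] * P[k][i] for i in range(n_out)] for k in range(n_in)]
    # Output distribution q(s_i), as in eq. (1).
    q = [sum(joint[k][i] for k in range(n_in)) for i in range(n_out)]

    def h(probs):
        return -sum(x * math.log2(x) for x in probs if x > 0)

    h_s = h(q)
    h_joint = h([x for row in joint for x in row])
    # Chain rule: H(X_Omega | X_s) = H(X_Omega, X_s) - H(X_s).
    return h_s, h_joint, h_joint - h_s


def symmetry_gap(p, P):
    """H(X_s) - H(X_Omega | X_s); zero when the symmetry condition holds."""
    h_s, _, h_cond = entropies(p, P)
    return h_s - h_cond


p = [0.25, 0.25, 0.25, 0.25]
one_to_one = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
two_meanings_per_signal = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(symmetry_gap(p, one_to_one))
print(symmetry_gap(p, two_meanings_per_signal))
```

In this toy case the one-to-one coder leaves a gap of 2 bits, since the decoder needs no extra information, whereas the coder that assigns two equiprobable meanings to each signal satisfies eq. (2) exactly, with $H(X_s) = H(X_\Omega|X_s) = 1$ bit.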



The above ansatz is the mathematical formulation of the symmetric balance between the efforts of the coder and the decoder. We will refer to this equation as the symmetry condition and, as pointed out in [11], it mathematically describes how the communicative tension is solved by using a cooperative strategy between the coder and the decoder agents. It is worth noting that different equations sharing the same spirit were formerly proposed within the framework of the so-called code-length game [10]. From eq. (2), we can state that:

$$H(X_\Omega, X_s) = 2H(X_s).$$

And knowing the classical inequalities

$$H(X_\Omega, X_s) \geq H(X_\Omega),$$

$$H(X_\Omega|X_s) = H(X_\Omega, X_s) - H(X_s) \leq H(X_\Omega),$$

we reach a general relation between the informative richness of the input variable $X_\Omega$ and the informative richness of the messages sent by the coder, constrained by eq. (2):

$$\frac{1}{2} H(X_\Omega) \leq H(X_s) \leq H(X_\Omega). \tag{3}$$



The first relation becomes an equality only in the case of P performing a deterministic codification process. The second relation becomes an equality when the coding device performs completely random associations. It is clear that eqs. (2) and (3) alone cannot explain the emergence of Zipf's law, since one could tune the parameters of, say, an exponential distribution to reach the desired relation between entropies. Therefore we need to introduce another ingredient to obtain Zipf's law as the unique possible solution of our problem.
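The remark about tuning an exponential distribution can be illustrated numerically. The sketch below is an assumption-laden toy, not the authors' procedure: it takes a truncated geometric distribution $q(s_i) \propto r^{i}$ over $n$ signals and finds, by bisection, the ratio $r$ for which its entropy equals a prescribed fraction of $\log n$. The names `geometric_pmf`, `tune_ratio` and the parameter `nu` are hypothetical.

```python
import math


def geometric_pmf(n, r):
    """Truncated geometric (discrete exponential) distribution over n states."""
    weights = [r ** i for i in range(n)]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(q):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)


def tune_ratio(n, nu, iters=100):
    """Find r in (0, 1) such that H(geometric_pmf(n, r)) = nu * log(n).

    The entropy grows monotonically with r, from 0 (r -> 0) to log n (r = 1),
    so plain bisection is enough.
    """
    target = nu * math.log(n)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(geometric_pmf(n, mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, nu = 100, 0.5
r = tune_ratio(n, nu)
print(r, entropy(geometric_pmf(n, r)) / math.log(n))
```

A non-power-law distribution can thus be made to meet a static entropy constraint for any given $n$, which is why the static relations alone do not single out Zipf's law.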



B. Evolution 

The unicity of the solution is provided by the evolution, which is now explicitly introduced (see fig. 1(b)). Let us suppose that our communicative success grows over time, thereby increasing the number of input symbols that P can encode. Formally, this implies that the cardinality of the set $\Omega$ defined above increases. We introduce this feature by defining a sequence of $\Omega$'s, $\Omega(1), \ldots, \Omega(k), \ldots$, satisfying an inclusive ordering, i.e.,

$$\Omega(1) \subset \Omega(2) \subset \ldots \subset \Omega(k) \subset \ldots,$$

which is introduced, without any loss of generality, assuming that

$$\begin{aligned}
\Omega(1) &= \{m_1\}, \\
\Omega(2) &= \{m_1, m_2\}, \\
&\;\;\vdots \\
\Omega(n-1) &= \{m_1, \ldots, m_{n-1}\}, \\
\Omega(n) &= \{m_1, \ldots, m_{n-1}, m_n\}.
\end{aligned}$$

At time step $n$, P will be able to process the $n$ symbols of $\Omega(n)$. The elements $m_1, \ldots, m_i, \ldots$ are members of some infinite, countable set that contains every $\Omega(i)$. This set can be understood, using a thermodynamical metaphor, as a reservoir of information. Following this characterization, we say that for every set $\Omega(i)$ there is a random variable $X_\Omega(i)$ taking values in $\Omega(i)$ following the ordered probability distribution $p_i$. Furthermore, we assume that there exists a unique $\mu \in (0, 1)$ such that $(\forall \epsilon > 0)(\exists N) : (\forall n > N)$,

$$\left| \frac{H(X_\Omega(n))}{\log n} - \mu \right| < \epsilon. \tag{4}$$



This means that the entropy of the input set is unbounded when its size increases, which implies that the potential input set acts as an infinite reservoir of information. The behavior of the output set at the stage $n$ is described by a random variable $X_s(n)$, which follows the ordered probability distribution $q_n$, as defined in eq. (1), taking values on $S(n) = \{s_1, \ldots, s_n\}$. We observe that $S(n) \subset S(n+1)$, defining a sequence $S(1), \ldots, S(k), \ldots$ also ordered by inclusion. At every time step, the consequences of the symmetry condition (see eq. (2)) depicted in eq. (3) are satisfied, which implies that the sequence

$$\mathcal{H} = H(X_s(1)), H(X_s(2)), \ldots, H(X_s(k)), \ldots$$

also satisfies the convergence ansatz made over the sequence of normalized entropies of the input (see eq. (4)). The only difference is the value of the limit, $\nu$. The value of $\nu$ can be bounded by using eqs. (3) and (4), thereby obtaining:

$$\frac{1}{2}\mu \leq \nu \leq \mu. \tag{5}$$



Therefore, in this case, by virtue of eqs. (3), (4) and (5), the convergence condition for the normalized entropies of the sequence of random variables $X_s(1), \ldots, X_s(n), \ldots$ reads: there exists a unique $\nu \in (0, 1)$ such that $(\forall \epsilon > 0)(\exists N) : (\forall n > N)$:

$$\left| \frac{H(X_s(n))}{\log n} - \nu \right| < \epsilon. \tag{6}$$

The above equation depicts two crucial facts for the forthcoming derivations: If the potential informative richness of the input set is unbounded, so is the informative richness of the output set, under the constraints imposed by the symmetry condition (see eq. (2)).



III. THE EMERGENCE OF ZIPF'S LAW 
UNDER THE MDIP 




FIG. 2: Numerical simulation of the final distribution $q_n$ (n = 10^) obtained by constraining the growth process with i) the consequences of the symmetry of coding/decoding (see eq. (2)) provided by eq. (6) and ii) the application of the MDIP at every step of the growth process. Different convergence values are studied: a) $\nu = 0.2$, b) $\nu = 0.3$ and c) $\nu = 0.5$. The dashed lines are the best-fit interpolations, which give estimated exponents $\gamma = 1.06$, 1.04 and 1.01, respectively (all with correlation coefficients $r < -0.99$).



The MDIP is presented in this section as the variational principle guiding the evolution of the code. As we shall see at the end of this section, the consequences of its application result in a proof of unicity for the emergence of Zipf's law in evolving codes.

A. The MDIP and its consequences for the 
evolution of codes 

The question is thus how the probability distribution $q_n$ evolves along the growth process. Under the MDIP we face a variational problem which is stated as follows: During the growth process, the most likely code at step $n+1$ is the one minimizing the distance with respect to the code at step $n$, consistently with the MDIP. Furthermore, the evolution of the code must satisfy, along all the evolutionary steps, the symmetry condition depicted by eq. (2). The crucial contribution of the MDIP is that it naturally introduces the footprints of the path dependence imposed by evolution. Following the thermodynamical metaphor, this variational principle acts, in our context, as a principle of energy minimization acting over the transitions between successive codes. Putting it formally, let

$$D(q_n\|q_{n+1}) = \sum_{i \leq n} q_n(s_i) \log \frac{q_n(s_i)}{q_{n+1}(s_i)}$$

be the Kullback-Leibler divergence of the distribution $q_{n+1}$ with respect to the distribution $q_n$ [22]. Therefore, the MDIP is achieved by minimizing the following functional [17]:

$$\mathcal{L}(q_{n+1}) = D(q_n\|q_{n+1}) + \lambda \left( \sum_{i \leq n+1} q_{n+1}(s_i) - 1 \right).$$

We observe that this functional has a role equivalent to the one attributed to the Lagrangian function in a given continuous, differentiable system; therefore, the trajectories minimizing it will govern the evolution of the system. Furthermore, the symmetry condition on coding/decoding, eq. (2), imposes that the solutions must lie in the region defined by eq. (6). The minimum of $\mathcal{L}$ is found when $q_{n+1}$ satisfies:

$$q_{n+1}(s_i) = \begin{cases} \lambda\, q_n(s_i) & \text{iff } i \leq n, \\ 1 - \lambda & \text{iff } i = n+1, \end{cases} \tag{7}$$

$\lambda$ being the Lagrange multiplier, which is a positive, unique constant for all elements of the probability distribution $q_{n+1}$. We observe that, for $\lambda = 1$, $D(q_n\|q_{n+1}) = 0$, but, in this case, $H(X_s(n)) = H(X_s(n+1))$, in contradiction to the assumption provided by eq. (6), according to which informative richness grows during the evolutionary process.

Now we want to find the asymptotic behavior of $q_n$, $n \to \infty$, under the above justified conditions (6) and (7). The key feature we derive from the path dependency in the evolution imposed by the MDIP is that the following quotient

$$(\forall k + j \leq n) \quad f(k, k+j) = \frac{q_n(s_{k+j})}{q_n(s_k)} \tag{8}$$

does not depend on $n$. Therefore, along the evolutionary process, as soon as

$$q_n(s_k), q_n(s_{k+j}) > 0,$$

$f(k, k+j)$ remains invariant.
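The invariance of the quotient can be checked directly on a small example. The following sketch applies the update of eq. (7) twice to an arbitrary starting distribution and verifies that ratios between already existing signals do not change; it also evaluates the Kullback-Leibler divergence of one step, which for this update equals $-\log \lambda$. The function names, the starting distribution and the values of $\lambda$ are illustrative assumptions.

```python
import math


def mdip_step(q, lam):
    """Eq. (7): rescale the n old probabilities by lam, give 1 - lam to s_{n+1}."""
    return [lam * x for x in q] + [1.0 - lam]


def kl_divergence(q_old, q_new):
    """D(q_n || q_{n+1}), summed over the n signals present at step n."""
    return sum(a * math.log(a / b) for a, b in zip(q_old, q_new) if a > 0)


def quotient(q, k, j):
    """f(k, k+j) = q(s_{k+j}) / q(s_k), with 1-based ranks as in eq. (8)."""
    return q[k + j - 1] / q[k - 1]


q3 = [0.6, 0.3, 0.1]
q4 = mdip_step(q3, lam=0.9)
q5 = mdip_step(q4, lam=0.95)

print(quotient(q3, 1, 1), quotient(q4, 1, 1), quotient(q5, 1, 1))
print(kl_divergence(q3, q4), -math.log(0.9))
```

Because every old probability is multiplied by the same factor, the ratio $f(k, k+j)$ is fixed at the moment both signals exist, which is the path dependence exploited in the rest of the derivation.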



B. The Emergence of Zipf's Law

The asymptotic behavior of the quotient $f$ and, thus, the tail of $q_n$, is strongly constrained by the entropy restriction provided by eq. (6) [12]. As we shall see, the key to the forthcoming derivations will be the convergence properties of the normalized entropies of a given random variable $X$ having $n$ possible states whose (ordered) probabilities follow a power-law distribution function, namely $q(s_i) \propto i^{-\gamma}$. The explicit form of these entropies is:

$$\frac{H(X)}{\log n} = \frac{1}{\log n} \left( \frac{\gamma}{Z_\gamma} \sum_{i \leq n} \frac{\log i}{i^\gamma} + \log Z_\gamma \right). \tag{9}$$

Consistently, $Z_\gamma$ is the normalization constant.
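The closed form of eq. (9) follows from inserting $q(s_i) = i^{-\gamma}/Z_\gamma$ into the definition of the entropy. As a sanity check, the sketch below compares it with a direct evaluation of $-\sum q \log q$; the function names and the tested values of $n$ and $\gamma$ are arbitrary choices for illustration.

```python
import math


def normalized_entropy_direct(n, gamma):
    """H(X) / log n computed from the definition H = -sum q log q."""
    weights = [i ** (-gamma) for i in range(1, n + 1)]
    z = sum(weights)
    h = -sum((w / z) * math.log(w / z) for w in weights)
    return h / math.log(n)


def normalized_entropy_eq9(n, gamma):
    """H(X) / log n computed from the closed form of eq. (9)."""
    z = sum(i ** (-gamma) for i in range(1, n + 1))
    s = sum(math.log(i) / i ** gamma for i in range(1, n + 1))
    return (gamma * s / z + math.log(z)) / math.log(n)


for gamma in (0.5, 1.0, 1.5):
    print(gamma,
          normalized_entropy_direct(1000, gamma),
          normalized_entropy_eq9(1000, gamma))
```

Both evaluations agree up to floating-point rounding, so either can be used to explore how the normalized entropy depends on the exponent.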

The first observation is that it can be shown that the convergence properties of the Riemann $\zeta$-function,

$$\zeta(\gamma) = \sum_{k=1}^{\infty} \frac{1}{k^\gamma},$$

strongly constrain the convergence properties of a given probability distribution. Indeed, we find that, if $(\forall \delta > 0, n > m)(\exists N)$ such that:

$$(\forall m > N) \quad f(m, m+1) < \left( \frac{m}{m+1} \right)^{(1+\delta)},$$

then $(\exists C < \infty \in \mathbb{R}^+)$ such that $(\forall n)(H(X_s(n)) < C)$, which contradicts the assumptions of the problem, depicted by eq. (6). Indeed, primarily, one can observe that the above statement implies that $q_n$ is dominated by a power law having exponent $1 + \delta$, i.e., that $q_n$ decays faster than $q'_n$, defined as:

$$q'_n(s_i) = \frac{i^{-(1+\delta)}}{Z_{1+\delta}},$$

where $Z_{1+\delta}$ is the normalization constant. Now, we write the explicit form of the entropy of $X'_s(n) \sim q'_n$, to be written as $H(X'_s(n))$, when $n \to \infty$ by multiplying the expression derived in eq. (9) by $\log n$:

$$\lim_{n \to \infty} H(X'_s(n)) = \frac{1+\delta}{\zeta(1+\delta)} \sum_{i=1}^{\infty} \frac{\log i}{i^{1+\delta}} + \log(\zeta(1+\delta)).$$

We observe that all the elements of the above equation are finite constants, since

$$\sum_{i=1}^{\infty} \frac{\log i}{i^{1+\delta}} < \infty.$$

Thus, having $q'_n$ as defined above,

$$\lim_{n \to \infty} H(X'_s(n)) < \infty.$$

Therefore, during the growth process, due to the constraint imposed by eq. (6),

$$f(m, m+1) > \left( \frac{m}{m+1} \right)^{(1+\delta)}, \tag{10}$$

with $\delta$ arbitrarily small, provided that $n$ can increase unboundedly. Otherwise, its normalized entropy (see eq. (9)) will have, as an asymptotic value,

$$\frac{H(X_s(n))}{\log n} \to 0,$$

in contradiction to the assumption that $\nu > 0$, as depicted in eq. (6).
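The boundedness argument can be observed numerically. The sketch below evaluates the entropy of a power law with exponent $1 + \delta$ for growing $n$, using the closed form of eq. (9) multiplied by $\log n$. The choice $\delta = 0.5$ and the function name are assumptions made only for this illustration.

```python
import math


def power_law_entropy(n, gamma):
    """Entropy (in nats) of q(s_i) = i**(-gamma) / Z over n states."""
    z = sum(i ** (-gamma) for i in range(1, n + 1))
    s = sum(math.log(i) / i ** gamma for i in range(1, n + 1))
    return gamma * s / z + math.log(z)


for n in (10**3, 10**4, 10**5):
    h = power_law_entropy(n, gamma=1.5)  # exponent 1 + delta with delta = 0.5
    print(n, round(h, 3), round(h / math.log(n), 3))
```

The entropy increases only marginally as $n$ grows by two orders of magnitude, so the normalized entropy keeps decreasing, in line with the statement that exponents larger than 1 are incompatible with a strictly positive $\nu$.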

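Furthermore, we observe that, if $(\forall \delta > 0, n > m)(\exists N)$ such that

$$(\forall m > N) \quad f(m, m+1) > \left( \frac{m}{m+1} \right)^{(1-\delta)}, \tag{11}$$

then

$$\lim_{n \to \infty} \frac{H(X_s(n))}{\log n} = 1,$$

again in contradiction to eq. (6), except in the extreme, pathological case where $\nu = 1$, when the coding process is completely noisy. To see how we reach this latter point we observe that statement (11) implies that $q_n$ is not dominated by a power-law probability distribution $q'_n$ having exponent $1 - \delta$, namely:

$$q'_n(s_i) = \frac{i^{-(1-\delta)}}{Z_{1-\delta}},$$

where $Z_{1-\delta}$ is the normalization constant. Writing explicitly the expression of the normalized entropy (see eq. (9)) for the random variable $X'_s(n)$, and using $Z_{1-\delta} \approx n^{\delta}/\delta$ for large $n$, one obtains:

$$\begin{aligned}
\lim_{n \to \infty} \frac{H(X'_s(n))}{\log n}
&= \lim_{n \to \infty} \left[ \frac{\delta(1-\delta)}{n^{\delta} \log n} \sum_{k \leq n} \frac{\log k}{k^{1-\delta}} + \frac{1}{\log n} \log\left( \frac{n^{\delta}}{\delta} \right) \right] \\
&= \lim_{n \to \infty} \frac{1}{\log n} \left[ (1-\delta)\left( \log n - \frac{1}{\delta} \right) + \log\left( \frac{n^{\delta}}{\delta} \right) \right] \\
&= 1,
\end{aligned}$$

which is the desired result. Accordingly, since, from eq. (6), $\nu$ is generally different from 1,

$$f(m, m+1) < \left( \frac{m}{m+1} \right)^{(1-\delta)}. \tag{12}$$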
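The opposite regime can be probed in the same way. The sketch below evaluates the normalized entropy of a power law with exponent $1 - \delta$ through the closed form of eq. (9). The value $\delta = 0.5$ and the function name are illustrative assumptions; convergence towards 1 is slow, since the corrections decay only as $1/\log n$.

```python
import math


def normalized_power_law_entropy(n, gamma):
    """H / log n for q(s_i) = i**(-gamma) / Z over n states, as in eq. (9)."""
    z = sum(i ** (-gamma) for i in range(1, n + 1))
    s = sum(math.log(i) / i ** gamma for i in range(1, n + 1))
    return (gamma * s / z + math.log(z)) / math.log(n)


for n in (10**3, 10**4, 10**5):
    # Exponent 1 - delta with delta = 0.5.
    print(n, round(normalized_power_law_entropy(n, gamma=0.5), 4))
```

The printed values stay below 1 and increase with $n$, which matches the limit derived above for exponents smaller than 1.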
Thus, combining eqs. (10) and (12), we have shown that the asymptotic solution is bounded by the following chain of inequalities:

$$\left( \frac{m}{m+1} \right)^{(1+\delta)} < f(m, m+1) < \left( \frac{m}{m+1} \right)^{(1-\delta)}.$$

The crucial step is that it can be shown that, if $n \to \infty$, we can set

$$\delta \to 0.$$

(The mathematical technicalities of this result can be found in [12].) This implies, in turn, that, for $n \gg 1$:

$$f(m, m+1) \approx \frac{m}{m+1},$$

and, from the definition of $f$ provided in eq. (8), we conclude that

$$q_n(s_m) \propto \frac{1}{m},$$

thereby leading us to Zipf's law as the unique asymptotic solution.

In fig. 2 we numerically explored the behavior of the rank probability distribution of signals belonging to a growing code under the assumption of symmetry in coding/decoding provided by eqs. (2) and (6), and the MDIP, whose consequences in the evolution of $q_n$ are depicted in eq. (7). The outcome agrees with the mathematical derivations, showing well-defined power laws with exponents close to 1 (fitted values of $\gamma$ between 1.01 and 1.06) although the convergence values $\nu$ range from 0.2 to 0.5. This numerical validation shows that the predicted asymptotic effects, i.e., the convergence of $q_n$ to Zipf's law, can be appreciated even in finite simulations where 10^ signals are at work.
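The text does not spell out the numerical procedure behind the figure, so the following is only a minimal sketch under explicit assumptions: at each step the update of eq. (7) is applied, and the new signal's probability $\epsilon = 1 - \lambda$ is chosen as the smallest value for which the entropy equals $\nu \log(n+1)$ exactly. It relies on the identity $H(q_{n+1}) = \lambda H(q_n) + h(\lambda)$, with $h$ the binary entropy, which follows from eq. (7). All function names, the code size (2000) and the fitting range are hypothetical choices.

```python
import math


def binary_entropy(eps):
    """h(eps) = -eps log eps - (1 - eps) log(1 - eps), in nats."""
    if eps <= 0.0 or eps >= 1.0:
        return 0.0
    return -eps * math.log(eps) - (1.0 - eps) * math.log1p(-eps)


def solve_epsilon(h_old, h_target, iters=100):
    """Smallest eps with (1 - eps) * h_old + h(eps) = h_target (assumed rule).

    The left-hand side increases with eps on [0, eps_star], so bisection works.
    """
    lo, hi = 0.0, 1.0 / (1.0 + math.exp(h_old))  # hi = eps_star
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (1.0 - mid) * h_old + binary_entropy(mid) < h_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def grow_code(n_max, nu):
    """Return q_n for n = n_max, imposing H(q_n) = nu * log(n) at every step."""
    eps = [1.0]  # eps[m] = probability given to signal m + 1 when it is added
    h = 0.0      # H(q_1) = 0 = nu * log(1)
    for n in range(1, n_max):
        h_target = nu * math.log(n + 1)
        eps.append(solve_epsilon(h, h_target))
        h = h_target
    # q_n(s_m) = eps_m times the product of all later lambdas (eq. (7) iterated).
    q = [0.0] * n_max
    scale = 1.0
    for m in range(n_max - 1, -1, -1):
        q[m] = eps[m] * scale
        scale *= 1.0 - eps[m]
    return q


def fitted_exponent(q, first_rank=10):
    """Minus the least-squares slope of log q against log rank."""
    xs = [math.log(m) for m in range(first_rank, len(q) + 1)]
    ys = [math.log(q[m - 1]) for m in range(first_rank, len(q) + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


q = grow_code(2000, nu=0.5)
print(round(fitted_exponent(q), 2))
```

Excluding the first few ranks from the fit is a design choice of this sketch: the head of the distribution is set by the earliest steps, whereas the derivation concerns the asymptotic tail.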

We end this section with a remark on the boundary conditions needed for the emergence of Zipf's law. In section II B we imposed that the potential information richness of the source must be unbounded. Such a condition is mathematically stated by eq. (4). We observe that, more than an assumption, eq. (4) is a boundary condition under which a growing code can (asymptotically) exhibit Zipf's law [27]. In this way, since $H(X_s(n))$ has a linear relation with $H(X_\Omega(n))$, the divergence of the latter implies the divergence of the former. And it is a required condition, since the entropy of a system exhibiting a power law with an exponent equal to 1 diverges with $n$. Otherwise, exponents are higher, or other probability distributions can emerge.



IV. DISCUSSION 

The results provided in our study define a general rationale for the emergence of Zipf's law in the abundance of signals of evolving communication systems. The variational approach taken here as a formal picture of the least effort hypothesis has two ingredients. First, starting from Zipf's conjecture, we reach a static symmetry equation to solve the communicative tension between coder and decoder. This is consistent with previous work, but proves insufficient to derive Zipf's law as the unique solution, for it is easy to check that static equations of the kind of eqs. (2) and (3) have infinitely many solutions, even in the asymptotic regime, due to the possible parametrizations of the solutions. Secondly, and crucially, we consider that the code evolves through time and that, consistently, there is a path dependence in its evolution, mathematically stated by imposing a variational principle, the MDIP, between successive states of the code. It is only by imposing evolution (and thus, path dependence) that we reach Zipf's law as the only asymptotic solution. Therefore, the origin of the power law with exponent $\gamma = 1$ derives from three complementary, very general conditions:

• The unbounded informative potential of the code, 

• the loss of information resulting from the symmetry
condition, depicted in eq. (2), and

• evolution, and its associated path dependency, vari- 
ationally imposed by the application of the MDIP 
over successive states of the evolution of the system. 

There is another, very interesting point, intimately tied to a code exhibiting Zipf's law and, more specifically, to the consequences of the symmetry condition, the mathematical ansatz which abstractly encodes Zipf's hypothesis of vocabulary balance: the presence of an inevitable ambiguity in the code. It is a common observation that natural languages are ambiguous, namely, that linguistic utterances or parts of linguistic utterances can be assigned more than one interpretation. If the principle of least effort is at work, and thus there is a cooperative strategy between the coder and the decoder, then the presence of a certain amount of ambiguity is expected, provided that the speaker tends to assign more than one meaning to certain signals. Therefore, ambiguity is a byproduct of efficient communication rather than a fingerprint of poor communicative design.

The presented framework is general, and rigorously demonstrates that Zipf's law is a natural outcome of a broad class of communication systems evolving under coding/decoding tensions. In other words, Zipf's law emerges in a system where the coder and decoder coevolve under a general problem of energy minimization. The range of application to real-world phenomena, however, must be checked against the validity of the data, for it has been pointed out that many supposed power-law behaviors show deviations when the statistical analysis is performed accurately [24, 25]. It should be noted, however, that a deviation from the predicted behavior need not necessarily be attributed to a failure of the framework. One should take into account that other constraints, such as general memory limitations, can play a role in shaping the final distribution.